Using LLMs to create analytical datasets: A case study of reconstructing the historical memory of Colombia
Anderson, David, Benitez, Galia, Bjarnadottir, Margret, Reyya, Shriyan
Colombia has been mired in decades of armed conflict, yet until recently, systematic documentation of violence was not a priority for the Colombian government. This has resulted in a lack of publicly available conflict information and, consequently, a lack of historical accounts. This study contributes to Colombia's historical memory by utilizing GPT, a large language model (LLM), to read and answer questions about over 200,000 violence-related newspaper articles in Spanish. We use the resulting dataset to conduct both descriptive analysis and a study of the relationship between violence and the eradication of coca crops, offering an example of the policy analyses that such data can support. Our study demonstrates how LLMs have opened new research opportunities by enabling examination of large text corpora at a previously infeasible depth.
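The extraction pipeline described in this abstract can be illustrated with a short sketch: prompt an LLM with a fixed question set for each article and collect the answers into a tabular dataset. This is a minimal, hypothetical example; the model name, prompt wording, and field names are illustrative assumptions, not the authors' actual setup.

```python
# Hedged sketch: batch question-answering over newspaper articles with an LLM
# to build an analytical dataset. The prompt, JSON schema, and model choice
# below are assumptions for illustration only.
import json
import pandas as pd
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTIONS = (
    "Read the Spanish newspaper article and answer in JSON with keys: "
    "'event_type' (e.g. homicide, kidnapping, combat), 'municipality', "
    "'date', 'victims' (integer or null), 'armed_group' (or null)."
)

def extract_record(article_text: str, model: str = "gpt-4o-mini") -> dict:
    """Ask the LLM the fixed question set about one article and parse the JSON answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You extract structured facts about violent events."},
            {"role": "user", "content": f"{QUESTIONS}\n\nArticle:\n{article_text}"},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

def build_dataset(articles: list[str]) -> pd.DataFrame:
    """Run the extraction over a corpus and collect the answers into one analytical table."""
    return pd.DataFrame([extract_record(a) for a in articles])
```

A table built this way can then feed standard descriptive or regression analyses, which is the kind of downstream policy study the abstract describes.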
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- South America > Colombia > Bolivar Department (0.04)
- South America > Colombia > Southwest Colombia (0.04)
- (7 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Government > Military (1.00)
- Government > Regional Government > South America Government > Colombia Government (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models
Wang, Chenguang, Liu, Xiao, Song, Dawn
We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LMs). Recent studies have demonstrated that pre-trained LMs, such as BERT and GPT, may store linguistic and relational knowledge. In particular, LMs are able to answer "fill-in-the-blank" questions when given a pre-defined relation category. Instead of focusing on pre-defined relations, we create an OIE benchmark that aims to fully examine the open relational information present in pre-trained LMs. We accomplish this by turning pre-trained LMs into zero-shot OIE systems. Surprisingly, pre-trained LMs obtain competitive performance on both standard OIE datasets (CaRB and Re-OIE2016) and two new large-scale factual OIE datasets (TAC KBP-OIE and Wikidata-OIE) that we establish via distant supervision. For instance, the zero-shot pre-trained LMs exceed the F1 score of state-of-the-art supervised OIE methods on our factual OIE datasets without using any training sets. Our code and datasets are available at https://github.com/cgraywang/IELM
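The "fill-in-the-blank" probing that this abstract contrasts with open IE can be sketched in a few lines. The snippet below is a generic cloze probe with a stock BERT checkpoint, assumed here purely for illustration; it is not the paper's zero-shot OIE pipeline or benchmark code.

```python
# Hedged sketch: pre-defined-relation "fill-in-the-blank" probing of a
# pre-trained LM, the background setup the abstract moves beyond.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# A cloze template for the pre-defined relation "capital of"; the model ranks
# candidate fillers for the [MASK] slot by probability.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```

Open IE, by contrast, asks the model to surface relations that were never specified in advance, which is what the benchmark in this entry evaluates.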
- Asia > Middle East > Iraq (0.28)
- Europe > France (0.15)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (69 more...)
- Research Report > New Finding (1.00)
- Personal > Obituary (1.00)
- Media > News (1.00)
- Media > Film (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- (12 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.84)
Language Models are Open Knowledge Graphs
Wang, Chenguang, Liu, Xiao, Song, Dawn
This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3) without human supervision. Popular KGs (e.g., Wikidata, NELL) are built in either a supervised or semi-supervised manner, requiring humans to create knowledge. Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training. The stored knowledge has enabled the language models to improve downstream NLP tasks, e.g., answering questions and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs are constructed with a single forward pass of the pre-trained language models (without fine-tuning) over the corpora. We demonstrate the quality of the constructed KGs by comparing them to two KGs (Wikidata, TAC KBP) created by humans. Our KGs also provide open factual knowledge that is new relative to existing KGs. Our code and KGs will be made publicly available.
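A rough, hypothetical sketch of reading relational structure out of a pre-trained LM in a single forward pass (no fine-tuning) is shown below. The attention averaging and the between-argument span-selection heuristic are simplifying assumptions for illustration; they do not reproduce the authors' published extraction algorithm.

```python
# Hedged sketch: score tokens between two argument mentions by the LM's
# attention mass and keep the strongest ones as a candidate relation phrase.
# The heuristic is an assumption for illustration only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_attentions=True)
model.eval()

def relation_tokens(sentence: str, head: str, tail: str, top_k: int = 2) -> list[str]:
    """Rank tokens between two argument tokens by attention and keep the top-k."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Average attention over all layers and heads: shape (seq_len, seq_len).
    attention = torch.stack(outputs.attentions).mean(dim=(0, 2)).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    head_idx = tokens.index(head)
    tail_idx = tokens.index(tail)
    # Candidate relation tokens lie between the two arguments; rank them by
    # how strongly the tail token attends to each of them.
    candidates = range(head_idx + 1, tail_idx)
    ranked = sorted(candidates, key=lambda i: attention[tail_idx, i].item(), reverse=True)
    return [tokens[i] for i in sorted(ranked[:top_k])]

# Example: recover a candidate relation phrase linking "Paris" and "France".
print(relation_tokens("Paris is the capital of France.", "Paris", "France"))
```

Triples collected this way across a corpus can then be mapped onto an existing schema or kept as open facts, which is the comparison against Wikidata and TAC KBP that the abstract describes.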
- Asia > Middle East > Iraq (0.28)
- Europe > France (0.15)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (87 more...)
- Personal > Obituary (1.00)
- Research Report (0.81)